Two-step clustering for data reduction combining DBSCAN and k-means clustering
Abstract
A novel combination of two widely used clustering algorithms is proposed here for the detection and reduction of high data density regions. The density-based spatial clustering of applications with noise (DBSCAN) algorithm is used to detect high data density regions, and the k-means algorithm is used for their reduction. The proposed algorithm iterates while successively decrementing the DBSCAN search radius, allowing for an adaptive reduction factor based on the effective data density. The algorithm is demonstrated for a physics simulation application, where a surrogate model for fusion reactor plasma turbulence is generated with neural networks. A training dataset for the surrogate model is created with a quasilinear gyrokinetics code for turbulent transport calculations in fusion plasmas. The training set consists of model inputs derived from a repository of experimental measurements, meaning there is a potential risk of over-representing specific regions of this input parameter space. By applying the proposed reduction algorithm to this dataset, this study demonstrates that the training dataset can be reduced by a factor of ~20 using the proposed algorithm, without a noticeable loss in the surrogate model accuracy. This reduction also provides a way of analyzing existing high-dimensional datasets for biases and consequently reducing them, which lowers the cost of re-populating that parameter space with higher quality data.
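The abstract describes the procedure only at a high level, so the following is a minimal Python sketch of one possible reading, not the authors' implementation. It assumes that each pass runs DBSCAN at the current search radius, leaves noise (sparse) points untouched, and replaces every detected cluster by k-means centroids, with the radius shrinking from pass to pass so that regions which remain dense are reduced again. The names `two_step_reduce`, `eps_schedule`, `min_samples` and `factor`, the fixed per-pass reduction factor, and the use of centroids rather than original samples are all assumptions made for illustration. Both algorithms are written with the standard library only to keep the sketch self-contained.

```python
import math
import random


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id, or -1 for noise (plain O(n^2) DBSCAN)."""
    n = len(points)
    # Neighbourhoods include the point itself, as in the usual DBSCAN definition.
    nbrs = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
            for i in range(n)]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(nbrs[i]) < min_samples:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        labels[i] = cluster
        stack = list(nbrs[i])
        while stack:
            j = stack.pop()
            if labels[j] is None or labels[j] == -1:
                labels[j] = cluster
                if len(nbrs[j]) >= min_samples:  # only core points expand
                    stack.extend(nbrs[j])
        cluster += 1
    return labels


def kmeans(points, k, iterations=25, seed=0):
    """Return k centroids from Lloyd's algorithm with random initial centres."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centres[c]))
            groups[nearest].append(p)
        # An empty group keeps its previous centre.
        centres = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centres[c]
                   for c, g in enumerate(groups)]
    return centres


def two_step_reduce(points, eps_schedule, min_samples=5, factor=2):
    """Hypothetical two-step reduction: DBSCAN detects dense regions,
    k-means replaces each of them by fewer representative points."""
    points = [tuple(p) for p in points]
    for eps in eps_schedule:  # assumed to be a decreasing list of radii
        labels = dbscan(points, eps, min_samples)
        # Points DBSCAN marks as noise lie in sparse regions and are kept as-is.
        kept = [p for p, lab in zip(points, labels) if lab == -1]
        for cid in set(labels) - {-1}:
            members = [p for p, lab in zip(points, labels) if lab == cid]
            kept.extend(kmeans(members, max(1, len(members) // factor)))
        points = kept
    return points
```

Calling `two_step_reduce(data, eps_schedule=[0.5, 0.25])` on a list of coordinate tuples returns a shorter list in which isolated points are unchanged and dense regions are thinned once per pass in which they are still detected. For real datasets the two helper functions would normally be swapped for library implementations, for example the `DBSCAN` and `KMeans` classes in scikit-learn.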
Similar resources
A Fuzzy C-means Algorithm for Clustering Fuzzy Data and Its Application in Clustering Incomplete Data
The fuzzy c-means clustering algorithm is a useful tool for clustering; but it is convenient only for crisp complete data. In this article, an enhancement of the algorithm is proposed which is suitable for clustering trapezoidal fuzzy data. A linear ranking function is used to define a distance for trapezoidal fuzzy data. Then, as an application, a method based on the proposed algorithm is pres...
Entropy-based Consensus for Distributed Data Clustering
The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...
The clustering and classification data mining techniques in insurance fraud detection: the case of Iranian car insurance
Given the ever-increasing spread of fraud in the insurance sector, especially in car insurance, and its negative consequences for insurance companies, applying suitable and efficient methods to identify and detect fraud in this area is essential. Understanding the patterns in data on previously reported claims can help determine whether a damage claim is genuine. One of the most common and widely used ways of discovering patterns in data is the use of ...
Data Reduction Method for Categorical Data Clustering
Categorical data clustering constitutes an important part of data mining; its relevance has recently drawn attention from several researchers. As a step in data mining, however, clustering encounters the problem of large amounts of data to be processed. This article offers a solution for categorical clustering algorithms when working with high volumes of data by means of a method that summarizes...
Comparing and Combining Dimension Reduction Techniques for Efficient Text Clustering
A great challenge of text mining arises from the increasingly large text datasets and the high dimensionality associated with natural language. In this research, a systematic study is conducted of six Dimension Reduction Techniques (DRT) in the context of the text clustering problem using three standard benchmark datasets. The methods considered include three feature transformation techniques, I...
Journal
Journal title: Contributions to Plasma Physics
Year: 2023
ISSN: 1521-3986, 0863-1042
DOI: https://doi.org/10.1002/ctpp.202200177